Search CORE

77 research outputs found

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

Author: AB Veretennikov
G Zipf
HE Williams
Justin Zobel
Matthew Chang
S Gugnani
Sergey Brin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains the query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency: for a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average search query execution time is 44 to 45 times shorter than when ordinary inverted indexes are used. This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes" published in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868" published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-01054-6_66. The work was supported by Act 211 of the Government of the Russian Federation, contract no. 02.A03.21.0006. Comment: Alexander B. Veretennikov. Chair of Calculation Mathematics and Computer Science, INSM. Ural Federal University
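The abstract does not spell out the layout of the additional indexes. As a rough illustration only, the Python sketch below pairs an ordinary positional inverted index with a hypothetical additional index keyed by (word, nearby word), so that a two-word proximity query reads one precomputed posting list instead of intersecting two potentially long ones. The names `build_indexes` and `proximity_search` and the window `MAX_DISTANCE = 3` are assumptions, not the paper's design; a real system would likely restrict which pairs are stored (for example, to frequently occurring words) to keep the index size bounded.

```python
from collections import defaultdict

# Hypothetical proximity window: words at most this many positions apart are "near".
MAX_DISTANCE = 3

def tokenize(text):
    return text.lower().split()

def build_indexes(docs):
    """Build an ordinary positional index plus a hypothetical pair index.

    positional[w] lists (doc_id, pos) for each occurrence of w.
    pair_index[(w, v)] lists (doc_id, pos_w, pos_v) whenever v occurs within
    MAX_DISTANCE positions of w, in either direction.
    """
    positional = defaultdict(list)
    pair_index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = tokenize(text)
        for i, w in enumerate(words):
            positional[w].append((doc_id, i))
            lo = max(0, i - MAX_DISTANCE)
            hi = min(len(words), i + MAX_DISTANCE + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_index[(w, words[j])].append((doc_id, i, j))
    return positional, pair_index

def proximity_search(pair_index, w, v):
    """Sorted ids of documents where w and v occur within MAX_DISTANCE.

    Reads a single precomputed posting list instead of intersecting the
    (possibly very long) positional lists of w and v.
    """
    return sorted({doc_id for doc_id, _, _ in pair_index.get((w, v), [])})

docs = [
    "to be or not to be",
    "the question is not what to be",
    "be quick and then later maybe think about it to",
]
positional, pair_index = build_indexes(docs)
print(proximity_search(pair_index, "to", "be"))  # → [0, 1]
```

The trade-off is the usual one for precomputed indexes: the pair index is larger than the positional index, but a query over two frequent words no longer pays for scanning both of their long posting lists.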

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Query-free news search

Author: Brin Sergey
Chang Bay-Wei
Henzinger Monika
Milch Brian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/01/2007

Many daily activities present information in the form of a stream of text, and people can often benefit from additional information on the topic being discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news currently being broadcast. We evaluated a variety of algorithms for this problem, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast. We also evaluated several postprocessing techniques for improving precision, including reranking using additional terms, reranking by document similarity, and filtering on document similarity. For the best algorithm, 84%-91% of the articles found were relevant, with at least 64% of the articles being on the exact topic of the broadcast. In addition, a relevant article was found for at least 70% of the topics.
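As a hedged sketch of one ingredient the abstract mentions, the code below weights the words of a recent caption window by term frequency times inverse document frequency and keeps the top `query_length` terms as the query. The background document frequencies, the smoothing inside `idf`, and the function names are hypothetical; the paper's algorithms also take stemming, compounds and history into account, which this sketch omits.

```python
import math
from collections import Counter

def idf(term, doc_freq, num_docs):
    # Smoothed inverse document frequency; terms unseen in the background
    # collection get the largest weight.
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))

def build_query(window_tokens, doc_freq, num_docs, query_length=3):
    """Return the query_length terms of the caption window with the highest tf*idf."""
    tf = Counter(window_tokens)
    scores = {t: count * idf(t, doc_freq, num_docs) for t, count in tf.items()}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:query_length]

# Hypothetical background statistics and caption text.
doc_freq = {"the": 1000, "said": 800, "senate": 40, "budget": 60, "vote": 90}
window = "the senate said the budget vote the senate".split()
print(build_query(window, doc_freq, num_docs=1000))  # → ['senate', 'budget', 'vote']
```

Although "the" is the most frequent word in the window, it occurs in every background document, so its IDF is zero and it never reaches the query.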

Infoscience - École polytechnique fédérale de Lausanne

Copy detection mechanisms for digital documents

Author: Garrett J. R.
Griswold G. N.
Héctor García-Molina
James Davis
Kahn R. E.
Manber U.
Sergey Brin
Wheeler D.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Query-Free News Search

Author: Bay-Wei Chang
Brian Milch
E. Brill
J. Budzik
Monika Henzinger
P. D. Turney
P. Hart
S. Brin
Sergey Brin
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Discovering gene annotations in biomedical text databases

Author: A Cakmak
Ali Cakmak
Burr Settles
Chin-Yew Lin
Deepak Ravichandran
DV Kalashnikov
E Camon
Ellen Riloff
Eugene Agichtein
G Salton
Gideon S Mann
Gultekin Ozsoyoglu
Jiawei Han
JoonHo Lee
K Asakawa
K Asako
Karen Sparck Jones
L Lovasz
Michael Fleischman
Oren Etzioni
Philip Resnik
PW Lord
Roy Rada
S Raychaudhuri
S White
Sergey Brin
The Gene Ontology Consortium
Tomonori Izumitani
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools that annotate genes and gene products with Gene Ontology concepts by capturing the related knowledge embedded in textual data. Results: In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products from PubMed article abstracts and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN uses textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and to produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Using WordNet for semantic pattern matching improves precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion: GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. Textual extraction patterns constructed from the existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible scheme than "exact matching", with the advantage of locating approximate pattern occurrences with similar semantics. The relatively low recall of our pattern-based approach may be improved either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data or, alternatively, by lowering the statistical enrichment threshold for applications that place more value on higher recall.
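To illustrate how semantic matching relaxes exact matching, here is a toy Python sketch. A pattern is a token sequence containing a `<GENE>` slot; a pattern word matches a text token either exactly or, when semantic matching is switched on, through a small hand-written `SYNONYMS` table that stands in for WordNet. The table, the slot marker, the example sentence, and `match_pattern` are all hypothetical and far simpler than GEANN's actual framework.

```python
# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "regulates": {"controls", "modulates"},
    "transcription": {"expression"},
}

SLOT = "<GENE>"  # hypothetical slot marker for the gene name

def word_matches(pattern_word, token, semantic):
    if pattern_word == token:
        return True
    # Semantic mode also accepts listed synonyms of the pattern word.
    return semantic and token in SYNONYMS.get(pattern_word, set())

def match_pattern(pattern, tokens, semantic=True):
    """Return the slot filler for every position where the pattern matches."""
    fillers = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        filler = None
        ok = True
        for p, t in zip(pattern, window):
            if p == SLOT:
                filler = t
            elif not word_matches(p, t, semantic):
                ok = False
                break
        if ok:
            fillers.append(filler)
    return fillers

tokens = "we show that brca1 modulates expression of target genes".split()
pattern = [SLOT, "regulates", "transcription"]
print(match_pattern(pattern, tokens, semantic=False))  # → []
print(match_pattern(pattern, tokens, semantic=True))   # → ['brca1']
```

Exact matching misses the sentence because it uses different words; the synonym-aware match recovers it, which is the kind of recall gain the abstract attributes to semantic matching.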

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Theories for influencer identification in complex networks

In social and biological systems, the structural heterogeneity of interaction networks gives rise to a small set of influential nodes, or influencers, in a series of dynamical processes. Although they make up only a small part of the entire network, these influencers have been observed to shape the collective dynamics of large populations in different contexts. As such, the successful identification of influencers should have profound implications for various real-world spreading dynamics such as viral marketing, epidemic outbreaks and cascading failure. In this chapter, we first summarize the centrality-based approach to finding single influencers in complex networks, and then discuss the more complicated problem of locating multiple influencers from a collective point of view. Progress rooted in collective influence theory, belief propagation and computer science will be presented. Finally, we present some applications of influencer identification in diverse real-world systems, including online social platforms, scientific publication, brain networks and socioeconomic systems. Comment: 24 pages, 6 figures
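The contrast between single-node centrality and a collective view can be made concrete with a small sketch. It scores nodes by degree and by the collective-influence score as it is usually defined, CI_l(i) = (k_i - 1) * sum of (k_j - 1) over nodes j at distance exactly l from i, where k is node degree. The graph and all function names are hypothetical, and the sketch only scores nodes; it does not implement the adaptive node removal or the belief-propagation methods the chapter discusses.

```python
from collections import deque

def make_graph(edges):
    """Undirected adjacency lists from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def degree_centrality(adj):
    return {node: len(nbrs) for node, nbrs in adj.items()}

def ball_boundary(adj, source, radius):
    """Nodes at shortest-path distance exactly `radius` from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the ball
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v, d in dist.items() if d == radius]

def collective_influence(adj, node, radius=1):
    """CI_l(i) = (k_i - 1) * sum of (k_j - 1) over nodes j at distance l from i."""
    k = degree_centrality(adj)
    return (k[node] - 1) * sum(k[j] - 1 for j in ball_boundary(adj, node, radius))

# Hypothetical graph: hub "a" touches mostly dead-end leaves, while "e"
# connects to other well-connected nodes ("f", "g").
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("e", "f"), ("e", "g"), ("f", "h"), ("f", "i"), ("g", "j"), ("g", "k")]
adj = make_graph(edges)
degrees = degree_centrality(adj)
ci = {n: collective_influence(adj, n) for n in adj}
print(max(degrees, key=degrees.get), max(ci, key=ci.get))  # → a e
```

Degree picks the hub "a" (degree 4), but three of its neighbours are leaves, so its CI is only 6; node "e" has lower degree yet a CI of 14 because its neighbours themselves lead further into the network.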

arXiv.org e-Print Archive

Crossref

Googling the Grey: Open Data, Web Services, and Semantics

Author: Andrew Baines
Christine Borgman
Cindy Stankowski
Cori Hayden
David Schloen
Dean R. Snow
Eric C. Kansa
Eric Kansa
Francis P. McManamon
Geoffrey C. Bowker
George P. Nicholas
Jennifer Trant
Karl-Heinz Lampe
Keith Kintigh
Kimberly Christen
Margie M. Burton
Martin Doerr
Michael Brown
Robin Boast
Sarah Whitcher Kansa
Sergey Brin
Tim Brody
Timothy J. Barringer
Publication venue: 'Springer Science and Business Media LLC'

Crossref

GIANT: Scalable Creation of a Web-scale Ontology

Author: Adomavicius Gediminas
Brin Sergey
Cordeiro Mário
Devlin Jacob
Doddington George R
Fader Anthony
Frantzi Katerina
Grishman Ralph
Ji Heng
Koo Terry
McClosky David
Mihalcea Rada
Pasca Marius
Pawar Sachin
Ritter Alan
Sha Lei
Smirnova Alisa
Witten Ian H
Zhang Ziqi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/04/2020

Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured, web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of the online population. Nor do they maintain a logically structured ontology among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology containing a large number of natural language phrases that conform to user attention at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present the graph-neural-network-based techniques used in GIANT and evaluate the proposed methods against a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that the Attention Ontology can significantly improve click-through rates in news recommendation. Comment: Accepted as a full paper by SIGMOD 202

arXiv.org e-Print Archive

Crossref

Use of ontologies to improve web search engine results (Uso de ontologías para la mejora de resultados de motores de búsqueda web)

Author: Bernard Jansen
Craig Silverstein
Dennis Wackerly
Dulce Aguilar-López
Erik Selberg
Gerard Salton
Jaime Bocio
Jorge Morato
Kevin Droegemeier
Mariano Fernández-López
Mark Chignell
Michael Lesk
Michel Dumontier
Mingxia Gao
Natalya Noy
Prasanna Ganesan
Rahul Ramachandran
Sergey Brin
Siddharth Patwardhan
Thomas Gruber
Publication venue: 'Ediciones Profesionales de la Informacion SL'

Crossref

Extracting patterns and relations from the world wide web

Author: Sergey Brin
Publication date: 01/01/1998

Abstract. The World Wide Web is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of automatically extracting a relation for such a data type from all of these sources. We present a technique that exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the World Wide Web.
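The pattern/relation duality can be sketched as a bootstrapping loop: find where known pairs occur, turn the text between them into patterns, apply the patterns to find new pairs, and repeat until nothing new appears. This Python sketch keeps only the text between author and title as a pattern and uses crude hypothetical regular expressions (`AUTHOR`, `TITLE`) for the two fields; the paper's patterns carry more context than this, and the pages and seed pair are invented for illustration.

```python
import re

# Hypothetical field shapes: a two-word capitalized name, and a capitalized
# title that runs up to the next "." or ";".
AUTHOR = r"([A-Z][a-z]+ [A-Z][a-z]+)"
TITLE = r"([A-Z][A-Za-z ]*?)[.;]"

def find_patterns(pages, relation):
    """Collect the literal text separating each known (author, title) pair."""
    patterns = set()
    for page in pages:
        for author, title in relation:
            a = page.find(author)
            if a == -1:
                continue
            t = page.find(title, a + len(author))
            if t == -1:
                continue
            middle = page[a + len(author):t]
            if 0 < len(middle) <= 20:  # hypothetical cap against overly loose patterns
                patterns.add(middle)
    return patterns

def apply_patterns(pages, patterns):
    """Extract every (author, title) pair that some pattern matches."""
    pairs = set()
    for middle in patterns:
        regex = re.compile(AUTHOR + re.escape(middle) + TITLE)
        for page in pages:
            for m in regex.finditer(page):
                pairs.add((m.group(1), m.group(2)))
    return pairs

def bootstrap(pages, seeds, max_iterations=5):
    """Alternate between patterns and relation until the relation stops growing."""
    relation = set(seeds)
    for _ in range(max_iterations):
        new_pairs = apply_patterns(pages, find_patterns(pages, relation))
        if new_pairs <= relation:
            break  # fixed point: no new pairs found
        relation |= new_pairs
    return relation

pages = [
    "Isaac Asimov wrote Foundation. Frank Herbert wrote Dune.",
    "Frank Herbert, author of Dune; Mary Shelley, author of Frankenstein.",
]
seeds = {("Isaac Asimov", "Foundation")}
print(sorted(bootstrap(pages, seeds)))
```

Starting from one seed, the loop learns " wrote " from the first page, which yields the Herbert pair; that pair then reveals ", author of " on the second page, which yields the Shelley pair. On real web pages, short middle strings like these admit false pairs quickly, so a practical system needs stricter patterns.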

CiteSeerX

Crossref